Representing Pattern Matching Algorithms by Polynomial-Size Automata
Abstract
Pattern matching algorithms that find exact occurrences of a pattern S ∈ Σ* in a text T ∈ Σ* have been analyzed extensively with respect to asymptotic best-, worst-, and average-case runtime. For more detailed analyses, researchers have considered the number of text character accesses X_n that an algorithm A performs when searching a random text of length n for a fixed pattern S. Constructing a state space and corresponding transition rules (e.g., in a Markov chain) that reflect the behavior of a pattern matching algorithm is a key step in existing analyses of X_n, in both the asymptotic (n → ∞) and the non-asymptotic regime. The size of this state space is hence a crucial parameter for such analyses. In this paper, we introduce a general methodology to construct corresponding state spaces and demonstrate that it applies to a wide range of algorithms, including Boyer-Moore (BM), Boyer-Moore-Horspool (BMH), Backward Oracle Matching (BOM), and Backward (Non-Deterministic) DAWG Matching (B(N)DM). In all cases except BOM, our method leads to state spaces of size O(m) for pattern length m, a result that had previously been obtained only for BMH. For the other algorithms, only state spaces of size exponential in m had been reported. Our results immediately imply an algorithm that computes the distribution of X_n for fixed S, fixed n, and A ∈ {BM, BMH, B(N)DM} in polynomial time for a very general class of random text models.
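To make "a state space and corresponding transition rules" concrete, the following Python sketch computes the exact distribution of X_n for Boyer-Moore-Horspool under an i.i.d. random text model. It deliberately uses a naive state space (window end position, full contents of the current window, and cost accumulated so far), which has up to |Σ|^m window contents per position, i.e. it is exponential in m; it is not the O(m) construction described above, only an illustration of the kind of object whose size the paper reduces. The function names `horspool_shift`, `window_cost`, `horspool_accesses`, and `access_count_distribution` are hypothetical, and the cost convention (each character compared in a window counts once, with the rightmost window character reused for the shift) is an assumption.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def horspool_shift(pattern, alphabet):
    """Horspool bad-character shift: distance from the last occurrence of c
    in pattern[:-1] to the end of the pattern, or m if c does not occur."""
    m = len(pattern)
    shift = {c: m for c in alphabet}
    for k, c in enumerate(pattern[:-1]):
        shift[c] = m - 1 - k  # later occurrences overwrite earlier ones
    return shift


def window_cost(pattern, window):
    """Text characters read in one window: compare right to left and stop
    at the first mismatch (or after m matches)."""
    cost = 0
    for k in range(len(pattern) - 1, -1, -1):
        cost += 1
        if window[k] != pattern[k]:
            break
    return cost


def horspool_accesses(pattern, text, alphabet):
    """Run Horspool on one concrete text and return X_n (assumed cost model:
    the rightmost window character is counted once, for comparison and shift)."""
    m, n = len(pattern), len(text)
    shift = horspool_shift(pattern, alphabet)
    j, total = m - 1, 0  # j = index of the last character of the window
    while j < n:
        total += window_cost(pattern, text[j - m + 1:j + 1])
        j += shift[text[j]]
    return total


def access_count_distribution(pattern, probs, n):
    """Exact distribution {x: P(X_n = x)} for an i.i.d. text with character
    probabilities `probs`. State = (window end j, window contents, cost so far);
    this naive state space is exponential in m."""
    m = len(pattern)
    alphabet = list(probs)
    shift = horspool_shift(pattern, alphabet)
    result = defaultdict(Fraction)
    if n < m:
        result[0] = Fraction(1)  # no window fits: zero accesses
        return dict(result)

    def weight(chars):
        p = Fraction(1)
        for c in chars:
            p *= probs[c]
        return p

    # layers[j][(window, cost)] = probability of reaching a window ending at j
    layers = defaultdict(lambda: defaultdict(Fraction))
    for window in product(alphabet, repeat=m):
        layers[m - 1][(window, window_cost(pattern, window))] += weight(window)

    for j in range(m - 1, n):
        for (window, acc), p in layers.pop(j, {}).items():
            s = shift[window[-1]]
            if j + s >= n:  # next window would run past the text: stop
                result[acc] += p
                continue
            # The next window keeps the last m - s characters of this one and
            # reads s fresh, independent characters on the right.
            for fresh in product(alphabet, repeat=s):
                new = window[s:] + fresh
                layers[j + s][(new, acc + window_cost(pattern, new))] += p * weight(fresh)
    return dict(result)
```

For example, `access_count_distribution("aba", {"a": Fraction(1, 2), "b": Fraction(1, 2)}, 10)` returns exact probabilities as `Fraction` values. Summing over all |Σ|^n texts with `horspool_accesses` gives an independent check for small n; the state-space version avoids that enumeration and is polynomial in n for a fixed pattern and alphabet, but its exponential dependence on m is exactly what a compact state space is meant to remove.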
Similar resources
About the Size of Boyer-Moore Automata
We study the size of Boyer-Moore automata introduced in Knuth, Morris & Pratt's famous paper on pattern matching. We experimentally exhibit a finite class of binary patterns which produce large Boyer-Moore automata. The best approximation curve for their sizes is a polynomial O(m^7), or even an exponential O(2^{0.4m}), in the length m of the patterns. All the previously known maximal sizes were at...
Exact Analysis of Pattern Matching Algorithms with Probabilistic Arithmetic Automata
We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer-Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and more. In particular, we show how to efficiently obtain the distribution of such an algorithm’s running time cost for any given pattern in a random text model, which can be quite general, from simple uniform ...
An Algorithm to Compute the Character Access Count Distribution for Pattern Matching Algorithms
We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer–Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and more. In particular, we develop an algorithm that efficiently computes the distribution of a pattern matching algorithm’s running time cost (such as the number of text character accesses) for any given patte...
The Compression of Subsegments of Images
We investigate how the size of the compressed version of a 2-dimensional image changes when we cut off a part of it, e.g. extracting a photo of one person from a photo of a group of people. 2-dimensional compression is considered in terms of finite automata. Let n be the size of the smallest acyclic automaton which describes an image T. We show that the tight bound for the compression size of a su...
On the Synthesis of Strategies in Infinite Games
Completeness and Weak Completeness Under Polynomial-Size Circuits; Communication Complexity of Key Agreement on Small Ranges; Pseudorandom Generators and the Frequency of Simplicity; Classes of Bounded Counting Type and their Inclusion Relations; Lower Bounds for Depth-Three Circuits With Equals and Mod-Gates; On Realizing Iterated Multiplication by Small Depth Threshol...
Journal: CoRR
Volume: abs/1607.00138
Publication date: 2016